Reprinted from ELECTRONIC DESIGN December 1993 

Breaking the I/O Bottleneck

■ DAVE DANNENBERG ■ Intel Corp., 5000 W. Chandler Blvd., Chandler, AZ 
85226; (602)554-2429. 



1. While backplane bus speeds have increased from the ISA's 10 Mbytes/s to almost 150 Mbytes/s with
the PCI, the disk-to-adapter channel hasn't been able to keep pace, creating an I/O bottleneck.




Growing numbers of 
desktop PCs are connected to file 
and application servers in client/ 
server configurations, in an environ- 
ment in which PCs, networks, and 
servers form a web of distributed 
and tightly coupled computing pow- 
er. This type of architecture relies on 
the smooth flow of large amounts of 
data to make it work. As communica- 
tions are improved in one part of the 
system, bottlenecks are likely to ap- 
pear elsewhere. For example, in to- 
day's systems, a data bottleneck is 
emerging at the desktop and server 
network connection, but an even 
tighter bottleneck has appeared in 
the PC server's disk I/O. 

Data transferred from a PC 
server typically traverses two se- 
quential paths before reaching the 
server's host memory: a disk-to-adapter
channel, such as the Small
Computer System Interface (SCSI);
and the adapter-to-host backplane
bus, such as EISA or the faster
Peripheral Component Interconnect
(PCI). As backplane bus speeds in
the PC servers leap to the speeds of
PCI, the disk-to-adapter channel
begins to limit performance (Fig. 1).
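
The effect of two sequential paths can be sketched with a few lines of arithmetic: the sustained rate of stages in series is capped by the slowest stage. The 132-Mbytes/s PCI figure is the one quoted in this article; the 10-Mbytes/s disk-channel rate is an assumed value for a single 8-bit Fast SCSI-2 channel, and the function names are hypothetical.

```python
# Sketch: throughput of a two-stage path is limited by its slowest stage.
DISK_CHANNEL = 10.0   # Mbytes/s, ASSUMED rate of one Fast SCSI-2 channel
PCI_BUS = 132.0       # Mbytes/s, PCI figure quoted in the article


def path_throughput(*stage_rates):
    """Return the sustained rate of stages connected in series."""
    return min(stage_rates)


def bottleneck(stages):
    """Return the name of the slowest stage in a {name: rate} mapping."""
    return min(stages, key=stages.get)


pci_system = {"disk channel": DISK_CHANNEL, "backplane": PCI_BUS}
print(path_throughput(*pci_system.values()), "Mbytes/s, limited by",
      bottleneck(pci_system))
```

Under these assumptions the backplane could carry far more data than the disk channel delivers, which is the imbalance Fig. 1 illustrates.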

High performance 32-bit micro- 
processors, such as Intel's i960 pro- 
cessor, have the intelligence and 
data-handling agility to relieve the 
disk I/O bottleneck. The processor 
possesses facilities that attack three 
separate facets of the problem: 

• Multichannel communication, as 
implemented in Redundant Array of 
Inexpensive Disks (RAID) 

• Large, fast disk caches at the 
adapter, bypassing the disk-to- 
adapter data paths 

• Large, fast disk caches at the 
adapter, bypassing the disk-to- 
adapter channel altogether for many 
accesses. 
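
As a rough model of how multichannel communication and adapter caching help, the sketch below estimates an aggregate rate for several disk channels and an average rate for a given cache hit ratio. The per-channel rate and hit ratio are illustrative assumptions, not figures from this article, and the function names are hypothetical.

```python
# Sketch: simple bandwidth models for striping and caching (assumed values).

def striped_rate(channel_rate, channels, backplane_rate):
    """Aggregate rate of several disk channels, capped by the backplane."""
    return min(channel_rate * channels, backplane_rate)


def cached_rate(hit_ratio, cache_rate, miss_rate):
    """Average rate when a fraction `hit_ratio` of the bytes comes from cache.

    The time to move one Mbyte is the weighted sum of the cache and disk
    times, so the average rate is the reciprocal of that sum.
    """
    time_per_mbyte = hit_ratio / cache_rate + (1.0 - hit_ratio) / miss_rate
    return 1.0 / time_per_mbyte


# Four ASSUMED 10-Mbytes/s channels feeding a 132-Mbytes/s backplane.
print(striped_rate(10.0, 4, 132.0))
# ASSUMED 50% hit ratio, 100-Mbytes/s cache, 10-Mbytes/s disk channel.
print(round(cached_rate(0.5, 100.0, 10.0), 1))
```

The model ignores seek time, command overhead, and write behavior; it only shows why a high hit ratio matters more than raw cache speed once misses dominate the transfer time.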

The external bus design of the 
i960 processor optimizes memory 
and peripheral data transfers, making
it an optimal data-transfer engine
for high-bandwidth SCSI traffic
and concurrent transfer across a
backplane bus, such as PCI. The
processor's external bus design
provides the capability for highly tuned
interfaces to external memory and 
peripherals, without incurring the 
cost and complexity typical in de- 
signing with 32-bit microprocessors. 
The processor also uses a programmable
bus-interface unit that interfaces
to moderate- to slow-speed
peripherals as well as high-speed DRAM
with little or no interface logic.

The i960 processor's bus control- 
ler is configured differently for in- 
terfaces with each of the four exter- 
nal subsystems that comprise the in- 
telligent disk adapter (Fig. 2): 

• The main memory subsystem, for 
disk cache, program code, and pro- 
gram data. 

• The disk adapter BIOS used to ini- 
tialize the adapter. 

• One or more disk controllers, such 
as SCSI controllers. 

• The interface to the host memory, 
such as a PCI backplane bus. 

The bus interface is configured
according to memory region (or
range of memory addresses). Each 
memory region is programmed to 
match the characteristics of the ex- 
ternal memory subsystem. 
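
The per-region idea can be pictured as a small lookup table keyed by address range. The sketch below is illustrative only: the address ranges and wait-state counts are invented, the bus widths follow the labels in Fig. 2 as best they can be read, and the names `Region`, `REGIONS`, and `lookup` are hypothetical rather than part of any i960 programming interface. The backplane interface is left out because the article gives no attributes for it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    start: int         # first address of the region (ASSUMED)
    end: int           # last address of the region, inclusive (ASSUMED)
    bus_width: int     # data-bus width in bits
    burst: bool        # True if burst accesses are enabled
    wait_states: int   # inserted wait states (ASSUMED counts)
    cacheable: bool    # False keeps accesses out of the on-chip data cache


REGIONS = [
    Region("burst DRAM", 0x0000_0000, 0x0FFF_FFFF, 32, True, 1, True),
    Region("extended-burst DRAM (disk cache)",
           0x1000_0000, 0x1FFF_FFFF, 32, True, 0, True),
    Region("disk controllers", 0x2000_0000, 0x2FFF_FFFF, 16, False, 2, False),
    Region("adapter BIOS", 0xF000_0000, 0xFFFF_FFFF, 8, False, 3, True),
]


def lookup(address):
    """Return the bus configuration that applies to `address`."""
    for region in REGIONS:
        if region.start <= address <= region.end:
            return region
    raise ValueError(f"no region configured for address {address:#010x}")


print(lookup(0xF000_0010).name, lookup(0xF000_0010).bus_width)
```

Every access is shaped by the entry for its address, which is why the glue logic for width conversion, wait states, and cacheability can stay out of the board design.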

The main memory subsystem is 
the most complex component of the 
server adapter design. DRAM is eco- 
nomically the best choice to imple- 
ment the large cache memory for the 
server adapter. Larger caches translate
directly to higher I/O data rates
from the server. Caching controllers 
often support up to 64 Mbytes of on- 
card disk cache. With implementa- 
tion of large caches, cached ac- 
cesses may be limited to the speed 
of the DRAM subsystem. 

To optimize the DRAM inter- 
face, access-interleaving techniques 
can be used to obtain the highest 
performance. The most efficient 
DRAM interface requires external 
logic to generate row and column 
address strobes, as well as logic to 
multiplex the address. In the case
of an interleaved design, external
logic is required to demultiplex the
data path. The high-performance
DRAM subsystems take advantage
of the fast page-access capabilities
of DRAM. In this access mode,
blocks of contiguous data can be
transferred rapidly.

Copyright © 1993 by Penton Publishing, Inc.,
Cleveland, Ohio 44114

In a DRAM subsystem used for 
disk caching, the DRAM is mapped 
into two regions to exploit fast page- 
mode accesses. One region is config- 
ured for normal burst accesses to the 
DRAM; the other is optimized for 
contiguous extended burst trans- 
fers. In an extended burst mode, the 
DRAM transfer rate approaches its 
theoretical maximum (Table 1). Most 
importantly, the transfer rate is 
close to the rate of the fast backplane 
bus, such as the PCI's 132 Mbytes/s.
With the extended burst mode, disk 
transfers from the cache exploit 
nearly the full bandwidth of the 
backplane bus interface. 
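
The relationship between clock counts and bandwidth can be checked with simple arithmetic: bytes moved, times the bus clock, divided by the clocks consumed. In the sketch below, the clock counts per four-word transfer are assumptions; they do not appear in the article and were chosen only because they reproduce the Table 1 figures on a 32-bit bus at 33.33 MHz.

```python
# Sketch: bus bandwidth from transfer size and clock count.
BUS_CLOCK_MHZ = 100.0 / 3.0   # the "33-MHz" system bus, taken as 33.33 MHz
BYTES_PER_WORD = 4            # 32-bit data bus


def bandwidth_mbytes_per_s(words, clocks, clock_mhz=BUS_CLOCK_MHZ):
    """Bytes moved per microsecond, which equals Mbytes/s."""
    return words * BYTES_PER_WORD * clock_mhz / clocks


# ASSUMED clocks per four-word transfer (not from the article).
ASSUMED_CLOCKS = {
    ("burst", "70-ns DRAM, non-interleaved"): 11,
    ("burst", "70-ns DRAM, 2-way interleaved"): 8,
    ("burst", "20-ns SRAM"): 4,
    ("extended burst", "70-ns DRAM, non-interleaved"): 9,
    ("extended burst", "70-ns DRAM, 2-way interleaved"): 5,
}

for (mode, memory), clocks in ASSUMED_CLOCKS.items():
    rate = bandwidth_mbytes_per_s(4, clocks)
    print(f"{mode:15s} {memory:30s} {rate:6.1f} Mbytes/s")
```

With these assumed counts the function gives roughly 48.5, 66.7, 133.3, 59.3, and 106.7 Mbytes/s, in line with Table 1. The extended-burst gain comes from spreading the fixed row-access overhead over more data words.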

A simple 8-bit ROM or EPROM 
is used to implement the adapter 
BIOS. The i960 processor's BIOS
memory region is configured for
8-bit-wide transfers. In this transfer
mode, the processor itself provides
the two lowest (byte) address lines,
so external logic to decode the low
address bits isn't needed.
The BIOS memory region is pro- 
grammed for simple non-burst ac- 
cesses consistent with ROM reads. 
This programming eliminates external
state machines that would otherwise
"unburst" the burst access for the
ROM's benefit. In addition, the BIOS
region is set up with delay cycles, re- 
ferred to as wait states. The wait 
states accommodate the slow access 
characteristics of ROMs. Again, the 
programming eliminates external 
logic that would be required for 
memory wait-state control. 
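
To make the 8-bit region concrete, the sketch below models a 32-bit fetch from a byte-wide ROM as four separate non-burst byte reads, with the two low address bits stepping from 0 to 3. The little-endian byte order, the three wait states, and the cost of two clocks plus wait states per access are assumptions made for illustration; the function name is hypothetical.

```python
# Sketch: a 32-bit fetch from an 8-bit ROM as four non-burst byte accesses.

def read_word_from_byte_rom(rom, address, wait_states=3):
    """Fetch one 32-bit word from a byte-wide ROM.

    Returns (word, clocks), where clocks is the modeled bus time.
    """
    base = address & ~0x3              # word-aligned start address
    word = 0
    clocks = 0
    for low_bits in range(4):          # the two low address lines: 0..3
        byte = rom[base | low_bits]
        word |= byte << (8 * low_bits)   # little-endian assembly (ASSUMED)
        clocks += 2 + wait_states        # address + waits + data (ASSUMED)
    return word, clocks


rom = bytes([0x78, 0x56, 0x34, 0x12])
word, clocks = read_word_from_byte_rom(rom, 0)
print(hex(word), clocks)   # prints "0x12345678 20"
```

The point of the model is that the sequencing and the wait states are handled by the bus controller's region settings rather than by external logic.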

Similarly, the region for the 
disk-controller devices is configured 
to match the peripheral device char- 
acteristics. Besides the characteris- 
tics mentioned for the BIOS inter- 
face, the disk-controller region can 
be set as non-cacheable to ensure co- 
herency with the i960 processor's on- 
chip data cache. This eliminates ex- 
ternal logic that otherwise is needed 
to detect non-cacheable memory. 

[Figure 2. Block diagram: the i960 connects over a shared address/data/control/DMA bus to main
memory (disk cache), the disk controllers, the adapter BIOS, and the backplane interface. The
memory map shows regions for the disk controllers, burst DRAM, and extended-burst DRAM (disk
cache), with region attributes: 16-bit, non-burst, non-cacheable; 32-bit burst with wait states;
32-bit with no wait states; 8-bit with wait states.]

2. The i960 processor's bus controller is configured differently as an interface for each of the four
external subsystems: main memory, adapter BIOS, backplane interface, and the disk controllers.

The i960 processor's on-chip
DMA controller serves as an ideal
interface to the disk controllers. The
DMA controller provides autonomous
control over these devices. The
DMA interface is suited for the 
adapter's multitasking environ- 
ments to handle simultaneous cach- 
ing operations or host memory trans- 
fers concurrently with data trans- 
fers from the disk. 

The server adapter's final com- 
ponent is the backplane interface. 
The PCI bus closely resembles an 
i960 processor burst-mode bus. The 
bridge logic to the PCI can be
implemented relatively simply. The
backplane interface is probably
implemented as a secondary bus master
from the i960 processor's perspec- 
tive. In this way, the interface can 
use the i960's low-latency bus-mas- 
ter capabilities to perform direct 
transfers from cache to host memo- 
ry. Alternatively, the i960 can imple- 
ment these directly by using fly-by 
DMA transfers, using the backplane 
interface as the source or destination 
port for DMA transfers.
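
The appeal of fly-by DMA can be shown by counting bus accesses. In a fly-by transfer the source drives the data bus while the destination latches it, so each item moves in a single access; a conventional transfer reads the item into the DMA controller and then writes it out. The sketch below is a simplified count under that assumption, with a hypothetical function name and an assumed 512-byte block size.

```python
# Sketch: bus accesses needed to move a block, fly-by versus two-step DMA.

def bus_accesses(words, fly_by):
    """Number of bus accesses needed to move `words` data items."""
    per_word = 1 if fly_by else 2   # fly-by: one combined read/write access
    return words * per_word


BLOCK_WORDS = 512 // 4   # one ASSUMED 512-byte block on a 32-bit bus
print(bus_accesses(BLOCK_WORDS, fly_by=True),
      bus_accesses(BLOCK_WORDS, fly_by=False))
```

Halving the number of accesses leaves more of the local bus free for the concurrent cache and disk traffic described earlier.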







Table 1. Transfer bandwidth from main memory
(Mbytes/s @ 33-MHz system bus)

Bus configuration     70-ns DRAM          70-ns DRAM            20-ns SRAM
                      (non-interleaved)   (2-way interleaved)
Burst mode            48.5                66.7                  133
Extended burst mode   59                  107                   N/A



